Uniform Resource Locators                          Tim Berners-Lee
INTERNET DRAFT                                                CERN
IETF URL Working Group                               30 March 1993
Expires: September 30, 1993

                    Uniform Resource Locators
Status of this memo
This document is an Internet Draft. Internet Drafts are
working documents of the Internet Engineering Task Force
(IETF), its Areas, and its Working Groups. Note that other
groups may also distribute working documents as Internet
Drafts.
Internet Drafts are working documents valid for a maximum of
six months. Internet Drafts may be updated, replaced, or
obsoleted by other documents at any time. It is not
appropriate to use Internet Drafts as reference material or to
cite them other than as a "working draft" or "work in
progress".
Distribution of this document is unlimited. Please send
comments to the author at timbl@info.cern.ch, or to the
discussion list ietf-url@merit.edu.
Abstract
Many protocols and systems for document search and retrieval
are currently in use, and many more protocols or refinements
of existing protocols are to be expected in a field whose
expansion is explosive.
These systems are aiming to achieve global search and
readership of documents across differing computing platforms,
and despite a plethora of protocols and data formats. As
protocols evolve, gateways can allow global access to remain
possible. As data formats evolve, format conversion programs
can preserve global access. There is one area, however, in
which it is impractical to make conversions, and that is in
the names used to identify objects. This is because names of
objects are passed on in so many ways, from the backs of
envelopes to hypertext objects, and may have a long life.
This paper discusses the requirements on a universal naming
syntax which can be used to refer to objects available using
existing protocols, and may be extended with technology. It
makes a recommendation for a generic syntax, and for specific
forms for "Uniform Resource Locators" (URLs) of objects
accessible using existing Internet protocols.
Uniform Resource Locators Berners-Lee
Terms
The objects on the network which are to be named include
typically objects which can be retrieved, and objects which
can be searched. There is a great variety of other objects
which may support other operations. We imply nothing about
the contents of objects in this document. Whereas human-
readable documents are currently the center of interest of the
field, we envisage all aspects discussed in this paper
applying to generalised objects when systems to handle them
become available. The "object" is the unit of reference and
need not correspond to any unit of storage. We refer to
objects which can be searched as "indexes". We emphasise that
this is the abstract view of the client, and these objects
need not correspond to physical files on computers. We refer
to the person who does the retrieval or searching as the user.
We use the term "name" for a string of characters
describing a document, which allows it (and it alone) to be
found. The term "address" is reserved for a string which
specifies a more or less physical location. The term
"locator" refers to a URL as here defined.
Requirements
This section discusses requirements for URLs, as an
introduction of and background for the Recommendations
section.
Uses of names and addresses
A name allows a user, with the help of a "client" program,
to retrieve or operate on objects via a "server" program. A
name may be passed for example:
- In communication of any form between two people, to
refer to a document, or part of a document;
- As part of the description of a link associated with
a hypertext document;
- As part of the result of searching an index.
Some typical requirements on a name which are met to a
varying degree by various schemes are for example that the
name is
Persistent A given name will remain valid as long as it
is needed;
Extensible A given naming syntax will remain valid
through the introduction of new protocols and
directory technologies;
Resolvable A name will contain enough information to
allow the document or index to which it
refers to be accessed, perhaps via resolution
into an intermediate, more physical, name.
Internet Draft 2 March 1993
Unique The fact that two names are identical implies
that the objects named are the same in some
way.
The syntax discussed is the syntax of one name, be it a
lasting name or a physical address. When a directory server
or hypertext link contains a set of alternative names, then
that is beyond the scope of this syntax. Similarly, a syntax
for describing a compound object is outside the scope of this
syntax. The specific locator name spaces (defined under the
umbrella of the general syntax) each meet the requirements
above to a greater or lesser extent.
Current practice
Current protocols use many different standards for names.
For some protocols, such as the ISO-10163 Search and Retrieve
protocol (International Standards Organization 1991), the names
returned in a search are only valid during the session. For
others, such as FTP (Postel 1985), they are
lasting names which may be used for object retrieval at a
later time. Typically, however, they are not long-lasting
names which are independent of the location of the object.
Such names may be provided using directory servers such as
x.500. They will refer to the registration, however formal or
informal, of an object with a particular organisation or
person. Both hypertext and manual references rely on long-
lasting names.
Current names are basically location specifiers (addresses).
These may be known as Uniform Resource Locators (URLs). They
give the necessary parts of an address for a reader to access
an information provider using the given protocol, and ask for
the object required. Examples of names used by various
protocols include
File Transfer Protocol     Host name or IP-address
(Postel 1985)              [IP port]
                           [user name, password]
                           Filename
W.A.I.S. (Kahle 1990)      Host name or IP-address
                           [IP port]
                           database name
                           local document id
Gopher (Alberti 1991)      Host name or IP-address
                           [IP port]
                           database name
                           selector string
HTTP (Berners-Lee 1991)    Host name or IP-address
                           [IP port]
                           local object id
NNTP (Kantor 1986) group   Group name
NNTP article               Host name
                           unique message identifier
x.500 distinguished name   Country
                           Organisation
                           Organisational unit
                           Person
                           Local object identifier
Other systems with their own naming schemes include the BITNET
"LISTSERV" application, FTAM file retrieval, SQLnet(TM) remote
database search, proprietary distributed file systems, etc.
Conventional syntax for writing these addresses involves
various forms of punctuation to separate these parts. This
sometimes, but not always, allows the naming scheme to be
deduced from the punctuation. For example, a name of the form
xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP
access. However, there is no well-defined algorithm for
parsing an arbitrary name, as there is no common syntax.
Expandability
There will necessarily be a phase during which lasting names
will become more common, as the deployment of directory
services increases to the point where every user has direct
or indirect access to one. Even then, however, one can
envisage more than one competing directory system, and cases
in which physical names are still required. A directory
service takes a lasting name and reduces it to a physical
address (or set of addresses) which, though less useful for
lasting reference, is the only way to actually retrieve the
object.
An addressing syntax is required which will be able to
encompass existing physical address spaces, and be extendible
to any future protocols. This requires that it contain an
identifier for the protocol in use. The format of the rest of
the address will necessarily depend to a certain extent on the
protocol.
Relevance
The life of a name is limited by any information contained
within it which may become prematurely invalid. It is
therefore necessary to limit the contents of a name to the
information required for the operations above. Other
extraneous information about the object (its size, data
format, authorisation details, etc.) may in general change
with time and should not be part of the name.
One might expect such information to be part of the "header"
of an object, and for protocols to allow the header information
to be retrieved independently of the objects themselves.
Any physical address may be subject to change with time:
hence we encourage the move to lasting names and directory
services.
Uniqueness
Clearly one requires uniqueness in the sense that one name
should refer to only one logical object. This is the case with
all the addressing schemes in use, whether they are directory
systems or physical addresses. (The internet addresses all
rely on the domain name (Mockapetris 1987) of the host to
achieve this).
However, given that names can be translated, many apparently
different names may lead to the same object. Any object may
therefore be referred to by many names. One needs to be able
to know whether two objects, retrieved through different
paths, are in fact the same object.
It is suggested that each object have one "official" name.
This name could be stored in the object in some
representations, or stored in a database accessible to the
server, for example. Any references within that object
should be parsed in the context of the official name. In the
presence of a directory service, the official name will
normally be the registered name of the object. However, a name
in any scheme will do, so long as it is completely specified.
On systems which do not allow the name to be stored (such as
anonymous FTP archive sites), a possible ambiguity will always
exist as to whether two similarly named objects are in fact
the same.
Note that Internet newsgroup names are unique world-wide,
and news articles carry a unique message id.
In most other cases, however, there is no guarantee that
dereferencing a URL will work, or that if it does the object
it refers to will in fact be the object intended. URLs such
as FTP addresses are transient in that files may be moved and
even replaced by different files of the same name. This
disorganisation may be limited by good server management, but
a naming scheme which is independent also of internet host
name is obviously preferable.
Readability by people
This requirement has been put forward by several people
(Clifford Lynch, Douglas Engelbart among others), and disputed
by others. The author's view is that it will be a while
before technology and standardisation have reached the point
at which names and addresses will be hidden from human beings.
As long as they must be written on the backs of envelopes and
"cut and pasted" between workstation windows, there is a
strong need for names to be
. Short
. Composed of printable (preferably non-white)
characters
. To a certain extent, understandable by a human being.
Structure of names and addresses.
A physical address is required in order for
. The user's program to contact the server
. The server to search an index, retrieve an object,
or look up the name;
. The user's program to locate an individual position
or element within an object.
This suggests that a name be structured, such that the parts
necessary for these three operations be separate and only
used by those system elements which need those parts. This
corresponds to the basic principle of information hiding. In
fact, four parts are necessary, including the indicator of
the naming scheme to be used:
. The naming scheme: a registered identifier for the
protocol.
. The name of a suitable server. The format of this
part must be well defined. It will depend on the
lower-layer protocols in use. Systems which use
widely distributed information, such as x.500 and
NNTP, do not need this part as each client generally
contacts his nearest server (or a particular
server).
. Information to be passed to the server. This may be
private to the server, as all names may be generated
and used by the same server. The client should
normally be transparent to this part of the name.
. Information to be used by the application once the
object has been retrieved. This part is private to
the application (or, more strictly, the data format)
and so cannot be defined here.
Both lasting names and physical addresses often share a
hierarchical structure. This follows often from the
organisation of the system. From the naming point of view, it
has the advantage that a reference in one object to another
object need not include that part of the structure which is
common to both names.
Choices
The requirements above leave little room for choice save for
the order and punctuation of the elements of an address. It
is only reasonable for the order of writing of the parts to be
consistently from left to right (or right to left) with
increasing specificity. Punctuation schemes fall into two
categories (Huitema 1991): tagged schemes, in which fields are
given names, and untagged schemes, which rely on special
characters and field order. The latter tend to be more compact.
protocol: aftp host: xxx.yyy.edu path: /pub/doc/README
PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;
PR:aftp/xx.yy.edu/pub/doc/README
/aftp/xx.yy.edu/pub/doc/README
Fig 1. Some alternative tagged and untagged representations
The choice of special symbols for punctuation tends to be a
matter of taste. It is easier to read addresses whose symbols
correspond to those of one's favourite operating system. A
variety of symbols is needed so that when a name is
abbreviated it is possible to tell which parts have been
omitted. The recommendation below uses special characters in
order to achieve a compact name, and uses where possible
punctuation symbols established in the internet or unix
community.
The choice of escape character for introducing
representations of non-allowed characters also tends to be a
matter of taste. An ANSI standard exists in the C language,
using the back-slash character "\". The use of this character
on unix command lines, however, can be a problem as it is
interpreted by many shell programs, and would have itself to
be escaped.
The use of white space characters has been avoided in URLs:
spaces are not legal characters. This was done because of
the frequent introduction of extraneous white space when lines
are wrapped by systems such as mail, or sheer necessity of
narrow column width, and because of the inter-conversion of
various forms of white space which occurs during character
code conversion and the transfer of text between applications.
Recommendations
This section describes the syntax for "Uniform Resource
Locators" (URLs): that is, basically physical addresses of
objects which are retrievable using protocols already deployed
on the net. The generic syntax provides a framework for new
schemes for names to be resolved using as yet undefined
protocols.
The syntax is described in two parts. Firstly, the syntax
rules of a completely specified name are given; secondly, the
rules under which parts of the name may be omitted in a well-
defined context.
Full form
A complete URL consists of a naming scheme specifier
followed by a string whose format is a function of the naming
scheme. For locators of information on the internet, a common
syntax is used for the IP address part. A BNF description of
the URL syntax is given in a later section. The components
are as follows.
Anchor-id
This represents a part of, fragment of, or a sub-function
within, an object. Its syntax and semantics are
defined by the application responsible for the object, or the
specification of the content type of the object. The only
definition here is of the allowed characters by which it may
be represented in a URL.
The anchor-id follows the URL of the whole object from which
it is separated by a hash sign (#). If the anchor-id is void,
the hash sign may be omitted: A void anchor-id with or without
the hash sign means that the URL refers to the whole object.
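The separation of the anchor-id from the URL of the whole object can be sketched as follows. This is a minimal illustration only; the function name split_anchor and the example URL are assumptions made for the sketch and are not part of this draft.

```python
def split_anchor(url):
    """Split a URL into (object_url, anchor_id) at the hash sign.

    A missing or void anchor-id is returned as "", which means that the
    URL refers to the whole object.
    """
    object_url, _, anchor_id = url.partition("#")
    return object_url, anchor_id


# Hypothetical example URL, for illustration only.
print(split_anchor("http://info.cern.ch/doc.html#intro"))
```

A void anchor-id gives the same result whether or not the hash sign is present, which matches the rule stated above.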
While this hook is allowed for identification of fragments,
the question of addressing of parts of objects, or of the
grouping of objects and relationship between contained and
containing objects, is not addressed by this document.
This document does not address the question of objects which
are different versions of a "living" object, nor of expressing
the relationships between different versions and the living
object.
Scheme
Within the URL of an object, the first element is the name of
the scheme, separated from the rest of the URL by a colon.
The rest of the URL follows the colon in a format depending on
the scheme.
Internet protocol parts
Those schemes which refer to internet protocols have a
common syntax for the rest of the object name. This starts
with a double slash "//" to indicate its presence, and
continues until the following slash "/". Within that section
are
. An optional user name, if this must be quoted to the
server, followed by a commercial at sign "@". (Use
of this field is discouraged. Provision of encoding
a password after the user name, delimited by a
colon, could be made but obviously is only useful
when the password is public, in which case it
should not be necessary, so that is also
discouraged.)
. The internet domain name of the host in RFC 1034
format (or, optionally and less advisably, the IP
address as four decimal numbers separated by periods)
. The port number, if it is not the default number for
the protocol, is given in decimal notation after a
colon.
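As an illustration of the common internet part described above, the following sketch separates a URL into scheme, user name, host, port and path. The helper name split_internet_part and the example host are assumptions made for the sketch; it does no validation of the allowed characters and fills in no default port.

```python
def split_internet_part(url):
    """Split 'scheme://[user@]host[:port]/path' into its components.

    Returns (scheme, user, host, port, path). user and port are None when
    absent; path is returned without its leading slash.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or not rest.startswith("//"):
        raise ValueError("no internet protocol part in %r" % url)
    # The internet part runs from the double slash to the following slash.
    internet_part, _, path = rest[2:].partition("/")
    user, at, hostport = internet_part.rpartition("@")
    host, colon, port = hostport.partition(":")
    return (scheme,
            user if at else None,
            host,
            int(port) if colon else None,
            path)


# Hypothetical example, for illustration only.
print(split_internet_part("ftp://guest@xxx.yyy.edu:2121/pub/doc/README"))
```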
Path
The rest of the locator is known as the "path". It may
define details of how the client should communicate with the
server, including information to be passed transparently to
the server without any processing by the client.
The path is interpreted in a manner dependent on the
protocol being used. However, when it contains slashes, these
must imply a hierarchical structure.
Partial form
In a certain limited set of cases, generally within a
certain application, it may be useful to pass only a section
of the URL. Within an object whose URL is well defined, the URL
of another object may be given in abbreviated form, where
parts of the two URLs are the same. This allows objects within
a group to refer to each other without requiring the space for
a complete reference, and it incidentally allows the group of
objects to be moved without changing any references. This is
not discussed in detail here; it is only mentioned so that the
characters required by the technique be reserved for that
purpose. It must be emphasised that when a reference is
passed in anything other than a well controlled context, the
full form must always be used.
The partial form relies on a property of the URL syntax that
certain characters ("/") and certain path elements ("..", ".")
have a significance reserved for representing a hierarchical
space, and must be recognised as such by both clients and
servers.
A partial form can be distinguished from a full form in that
a full form must have a colon and that colon must occur before
any slash characters.
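The test for distinguishing the two forms might be written as below. This is a sketch of the stated rule only; the function name is_full_form is an assumption.

```python
def is_full_form(url):
    """Return True if a colon is present and occurs before any slash."""
    colon = url.find(":")
    slash = url.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


print(is_full_form("ftp://xxx.yyy.edu/pub/doc/README"))  # full form
print(is_full_form("../doc/README"))                     # partial form
```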
The rules for the use of a partial name are:
. If the scheme parts are different, the whole
absolute locator must be given. Otherwise, the
scheme is omitted, and:
. If the host and/or port parts are different, the
host, port and all the rest of the locator must
be given.
. If the scheme and host parts are the same, then the
path may be given in absolute (fully qualified) or
relative form. Within the path:
. If a leading slash is present, the path is absolute.
Otherwise, a relative path is interpreted as
follows:
. The last part of the path of the context locator
(anything following the rightmost slash) is removed,
and the given partial URL appended in its place.
. Within the result, all occurrences of "/xxx/.." or
"/." are recursively removed, where xxx, ".." and
"." are complete path elements.
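The path rules above can be sketched as follows for the case where scheme, host and port are the same. The function name resolve_path is an assumption, and the sketch assumes that the context path begins with a slash. A ".." element with nothing left to remove is kept as it stands, a case which the rules above do not define.

```python
def resolve_path(context_path, partial):
    """Resolve a partial path against the path of the context locator."""
    if partial.startswith("/"):
        merged = partial  # a leading slash means the path is absolute
    else:
        # Drop anything following the rightmost slash of the context
        # path and append the partial path in its place.
        merged = context_path[:context_path.rfind("/") + 1] + partial
    if not merged.startswith("/"):
        merged = "/" + merged
    resolved = []
    for element in merged.split("/")[1:]:
        if element == ".":
            continue  # remove "/."
        if element == ".." and resolved and resolved[-1] != "..":
            resolved.pop()  # remove "/xxx/.."
        else:
            resolved.append(element)
    return "/" + "/".join(resolved)


print(resolve_path("/pub/doc/README", "../src/main.c"))
```

Processing the elements from left to right with a stack has the same effect as removing "/xxx/.." recursively, since each ".." cancels the nearest preceding element that has not already been removed.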
Mapping Local Names
When a system uses a local addressing scheme, it is useful
to provide a mapping from local addresses into URLs so that
references to objects within the addressing scheme may be
referred to globally, and possibly accessed through gateway
servers.
Any mapping scheme may be defined provided it is
unambiguous, reversible, and provides valid URLs. It is
recommended that where hierarchical aspects to the local
naming scheme exist, they be mapped onto the hierarchical URL
path syntax in order to allow the partial form to be used.
The following escaping method is used for mapping WAIS, FTP
and Gopher addresses onto URLs. Where the local naming scheme
uses ASCII characters which are not allowed in the URL, these
may be represented in the URL by a percent sign "%" followed
by two hexadecimal digits (0-9, A-F) giving the ASCII value
for that character. If non-ASCII characters are used, then a
similar escaping system should be used. Character codes other
than those allowed by the syntax shall not be used in a URL.
The same considerations apply to mapping local anchor
identifiers onto the anchorid part of a URL.
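A minimal sketch of the escaping method follows. The names SAFE, escape and unescape are assumptions, and the sketch handles ASCII input only. The set of characters left unescaped is taken from the "xalpha" production of the BNF, with the percent sign itself excluded so that the mapping remains reversible. Note that the slash is escaped as well, so the functions are meant to be applied to one path element at a time.

```python
import string

# Characters left unescaped in this sketch (an assumption based on the
# "xalpha" production, minus "%" so that unescape() can reverse escape()).
SAFE = set(string.ascii_letters + string.digits + "$-_@!^&*().")


def escape(text):
    """Replace each character outside SAFE by "%" and two hex digits."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            raise ValueError("this sketch handles ASCII only")
        out.append(ch if ch in SAFE else "%%%02X" % ord(ch))
    return "".join(out)


def unescape(text):
    """Reverse escape(): turn each %XX back into the ASCII character."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "%":
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(escape("Information About Gopher"))
```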
Specific Naming Schemes
The mapping for some existing standard and experimental
protocols is outlined in the BNF syntax definition. Notes on
particular protocols follow.
HTTP
The HTTP protocol specifies that the path is handled
transparently by those who handle URLs, except for the servers
which dereference them. The path is passed by the client to
the server with any request, but is not otherwise understood
by the client. The anchorid part is not sent with the
request. The search part, if present, is sent.
FTP
The ftp: prefix indicates a file which is to be picked up
from the file system of the given host. The FTP protocol is
used. The port number if given gives the port of the FTP
server if not the FTP default. (A client may in practice use
local file access to retrieve objects which are available
through more efficient means such as local file open or NFS
mounting, where this is available and equivalent.)
The syntax allows for the inclusion of a user name and even
a password for those systems which do not use the anonymous
FTP convention. The default, however, if no user or password
is supplied, will be to use that convention, viz. that the
user name is "anonymous" and the password the user's mail
address.
The adoption of a unix-style syntax involves the conversion
into non-unix local forms by either the client or server. Some
non-unix servers do this, but clients wishing to access sites
which do not have unix-style naming will need certain
algorithms to enable other file systems to be identified and
treated. Client software may also have to be flexible in
terms of the sequence of FTP commands used with different
varieties of server. In view of a tendency for file systems
to look increasingly similar, it was felt that the URL
convention should not be weighed down by extra mechanisms for
identifying these cases.
The data format of a file can only, in the general FTP case,
be deduced from the name, normally the suffix of the name.
This is not standardised. The transfer mode (binary or text)
must in turn be deduced from the data format. It is
recommended that conventions for suffixes of public archives
be established, but this is outside the scope of this paper.
Andrew File System
The afs: prefix indicates a following afs cellname and path.
News
The news locators refer to either news group names or
article message identifiers which must conform to the rules of
RFC 850. A message identifier may be distinguished from a
news group name by the presence of the commercial at "@"
character. These rules imply that within an article, a
reference to a news group or to another article will be a
valid URL (in the partial form).
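The distinction between the two kinds of news locator can be sketched as below. The function name classify_news, the returned labels and the example message identifier are assumptions; the "*" case is taken from the "groupart" production of the BNF.

```python
def classify_news(locator):
    """Classify a news locator, given in full or partial form."""
    groupart = locator[5:] if locator.startswith("news:") else locator
    if groupart == "*":
        return ("all-groups", groupart)
    if "@" in groupart:
        # Only a message identifier contains the commercial at sign.
        return ("article", groupart)
    return ("group", groupart)


print(classify_news("news:comp.infosystems.www"))
print(classify_news("1234@info.cern.ch"))  # hypothetical message id
```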
Note: An outstanding problem is that the message identifier
is insufficient to allow the retrieval of an expired article,
as no algorithm exists for deriving an archive site and
filename. The addition of the date and news group set to the
article's URL would allow this if a directory existed of
archive sites by news group. Suggested subject of study in
conjunction with NNTP WG. Further extension possible may be
to allow the naming of subject threads as addressable objects.
WAIS
The current public domain WAIS implementation requires that
a client know the "type" and length of an object prior to
retrieval. These values are returned along with the internal
object identifier in the search response. They have been
encoded into the path part of the URL in order to make the URL
sufficient for the retrieval of the object. If changes to
WAIS specs make the internal id something which is sufficient
for later retrieval then this will not be necessary.
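Under the "waisdoc" production of the BNF, the values encoded in the path might be recovered as sketched below. The function name parse_wais_doc, the dictionary keys and the example values are assumptions made for illustration.

```python
def parse_wais_doc(url):
    """Split wais://hostport/database/wtype/digits/path into its parts."""
    prefix = "wais://"
    if not url.startswith(prefix):
        raise ValueError("not a WAIS locator: %r" % url)
    # A locator with fewer than five parts raises ValueError here.
    hostport, database, wtype, length, doc_id = url[len(prefix):].split("/", 4)
    return {
        "hostport": hostport,
        "database": database,
        "type": wtype,          # the WAIS "type" of the object
        "length": int(length),  # the length of the object
        "id": doc_id,           # the internal object identifier
    }


# Hypothetical database name, type, length and identifier.
print(parse_wais_doc("wais://quake.think.com/wais-docs/TEXT/1024/doc17"))
```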
Within the WAIS world, names do not, of course, need to be
prefixed by "wais:" (by the partial form rules).
Prospero
The Prospero (Neuman, 1992) UDP-based virtual file system
protocol is used. The host and port parts are used, and
optional. The significance of the path part may be the name
of a file, or anything else according to the server. If the
path ends with a final slash "/" that indicates to the client
that the object is a directory to be listed. Prospero links
of the form EXTERNAL are converted into URLs of non-prospero
naming schemes (such as "ftp:").
The path may, as well as the Prospero "Host Specific Object
Name" (HSOName) have other following Prospero fields,
currently version and URN. These are appended to the HSOName
and separated by the characters "%23" (percent, two, three),
this being the escaped form of the Prospero hash sign
delimiter.
Gopher
The first character of the URL path part (after the initial
single slash) is a single-character "type" field which is that
used by the Gopher protocol. The rest of the path is the
"selector string", with unprintable characters and spaces
encoded.
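The division of the Gopher path into type and selector string can be sketched as below, following the prose description above (the first character is the type, the remainder the selector). The function name split_gopher_path is an assumption; decoding uses unquote from the Python standard library, and the example path is taken from the Gopher locator in the References section.

```python
from urllib.parse import unquote


def split_gopher_path(path):
    """Split a Gopher URL path (given without its initial slash) into
    the single-character type field and the decoded selector string."""
    if not path:
        return None, ""
    return path[0], unquote(path[1:])


print(split_gopher_path("00/Information%20About%20Gopher/About%20Gopher"))
```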
Gopher links which refer to different protocols may be
converted into URLs for those protocols.
Telnet, rlogin, tn3270
The use of URLs to represent interactive sessions is a
convenient extension to their uses for objects. This allows
access to information systems which only provide an
interactive service, and no information server. As
information within the service cannot be addressed
individually or, in general, automatically retrieved, this is
a less desirable, though currently common, solution.
x500
The mapping of x500 names onto URLs is not defined here. A
decision is required as to whether "distinguished names" or
"user friendly names" (ufn), or both, should be allowed. If
any punctuation conversions are needed from the adopted x500
representation (such as the use of slashes between parts of a
ufn) they must be defined. This is a subject for study.
WHOIS
This prefix describes the access using the "whois++" scheme
in the process of definition. The hostname part is the same as
for other IP based schemes. The path part can be either a
whois handle for a whois object, or it can be a valid whois
query string. This is a subject for further study.
Network Management Database
This is a subject for study.
Registration of naming schemes
A new naming scheme may be introduced by defining a mapping
onto a conforming URL syntax, using a new scheme identifier.
Experimental scheme identifiers may be used by mutual
agreement between parties, and must start with the characters
"x-".
It is proposed that the Internet Assigned Numbers Authority
(IANA) perform the function of registration of new schemes.
Any submission of a new scheme must include a definition of an
algorithm for the retrieval of any object within that scheme.
The algorithm must take the URL and produce either a set of
URL(s) which will lead to the desired object, or the object
itself, in a well-defined or determinable format. It is
recommended that those proposing a new scheme demonstrate its
utility and operability by the provision of a gateway which
will provide images of objects in the new scheme for clients
using an existing protocol.
It is likewise recommended that, where a protocol allows for
retrieval by URL, the client software have provision for
being configured to use specific gateway locators for indirect
access through new naming schemes.
BNF syntax
This is a BNF-like description of the Uniform Resource
Locator syntax. A vertical line "|" indicates alternatives,
and [brackets] indicate optional parts. Spaces are
representational only: no spaces are actually allowed within a
URL. Single letters stand for single letters. All words of
more than one letter below are entities described somewhere in
this description.
anchoraddress    docaddress [ # anchorid ]
docaddress       generic | httpaddress | fileaddress |
                 newsaddress | prosperoaddress | telnetaddress
                 | gopheraddress | waisaddress | afsaddress
generic          scheme : path [ ? search ]
scheme           ialpha
httpaddress      h t t p : / / hostport [ / path ] [ ? search ]
fileaddress      f t p : / / host / path
afsaddress       a f s : / / cellname / path
newsaddress      n e w s : groupart
waisaddress      waisindex | waisdoc
waisindex        w a i s : / / hostport / database [ ? search ]
waisdoc          w a i s : / / hostport / database / wtype /
                 digits / path
groupart         * | group | article
group            ialpha [ . group ]
article          xalphas @ host
database         xalphas
wtype            xalphas
prosperoaddress  p r o s p e r o : / / path
telnetaddress    t e l n e t : / / [ user @ ] hostport
gopheraddress    g o p h e r : / / hostport [ / gtype [ /
                 selector ] ] [ ? search ]
hostport         host [ : port ]
host             hostname | hostnumber
cellname         hostname
hostname         ialpha [ . hostname ]
hostnumber       digits . digits . digits . digits
port             digits
selector         path
path             void | xpalphas [ / path ]
search           xalphas [ + search ]
user             xalphas
anchorid         xalphas
gtype            xalpha
xalpha           alpha | $ | - | _ | @ | ! | % | ^ | & | * |
                 ( | ) | . | digit
xalphas          xalpha [ xalphas ]
xpalpha          xalpha | +
xpalphas         xpalpha [ xpalphas ]
ialpha           alpha [ xalphas ]
alpha            a | b | c | d | e | f | g | h | i | j | k | l
                 | m | n | o | p | q | r | s | t | u | v | w
                 | x | y | z | A | B | C | D | E | F | G | H
                 | I | J | K | L | M | N | O | P | Q | R | S
                 | T | U | V | W | X | Y | Z
digit            0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digits           digit [ digits ]
alphanum         alpha | digit
alphanums        alphanum [ alphanums ]
void
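As an illustration of how these productions might be checked mechanically, the following sketch translates the "httpaddress" production into a regular expression. The names used are assumptions, and the translation is one reading of the grammar rather than a definitive one. Since "." is itself an xalpha, the "hostname" production accepts the same strings as "ialpha", and the sketch uses that simplification. The "xpalphas" production is read as one or more xpalpha characters.

```python
import re

XALPHA = r"[A-Za-z0-9$\-_@!%^&*().]"
XPALPHA = r"[A-Za-z0-9$\-_@!%^&*().+]"          # xalpha | +
IALPHA = r"[A-Za-z]" + XALPHA + "*"             # alpha [ xalphas ]
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"  # digits . digits . ...
# hostname reduces to ialpha here because "." is an xalpha character.
HOSTPORT = "(?:" + IALPHA + "|" + HOSTNUMBER + ")(?::[0-9]+)?"
# path: void | xpalphas [ / path ]
PATH = "(?:" + XPALPHA + "+/)*(?:" + XPALPHA + "+)?"
# search: xalphas [ + search ]
SEARCH = XALPHA + r"+(?:\+" + XALPHA + "+)*"
HTTPADDRESS = re.compile(
    "http://" + HOSTPORT + "(?:/" + PATH + ")?" + r"(?:\?" + SEARCH + ")?")


def is_http_address(url):
    """Return True if url matches the httpaddress production."""
    return HTTPADDRESS.fullmatch(url) is not None


print(is_http_address("http://info.cern.ch/hypertext/WWW/TheProject.html"))
```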
Security considerations
The URL scheme does not in itself pose a security threat.
Users should beware that there is no general guarantee that a
URL which at one time points to a given object continues to do
so, and does not even at some later time point to a different
object due to the movement of objects on servers.
Conclusion
A need has been demonstrated, and a number of requirements
have been stated for uniform resource locators (URLs). A
scheme has been proposed which builds on existing conventions
to define a syntax for URLs. Adoption of the scheme in
correspondence, standards and software will ease the use of
references to on-line information in a flexible way as the
coming information age arrives.
Acknowledgements
This paper builds on much discussion of these issues by many
people on the network. The discussion was particularly
stimulated by articles by Clifford Lynch (1991), Brewster
Kahle (1991) and Wengyik Yeong (1991b). Contributions from
John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti
(MSEN) and later the IETF URL working group have been
incorporated into this issue of this paper.
The draft url4 was generated from url3 following discussion
and overall approval of the URL working group on 29 March
1993. The paper url3 had been generated from udi2 in the light
of discussion at the UDI BOF meeting at the Boston IETF in
July 1992.
References
Alberti, R., et al. (1991) "Notes on the Internet Gopher
Protocol" University of Minnesota, December 1991,
URL=file://boombox.micro.umn.edu/pub/gopher/gopher_protocol
. See also
URL=gopher://gopher.micro.umn.edu:70/00/Information%20About
%20Gopher/About%20Gopher
Berners-Lee, T., (1991) "HTTP as implemented in WWW", CERN,
December 1991,
URL=file://info.cern.ch./pub/www/doc/http.txt
Davis, F, et al., (1990) "WAIS Interface Protocol: Prototype
Functional Specification", Thinking Machines Corporation,
April 23, 1990
URL=file://quake.think.com/pub/wais/doc/protspec.txt
International Standards Organization, (1991) Information and
Documentation - Search and Retrieve Application Protocol
Specification for open Systems Interconnection, ISO-10163
Huitema, C., (1991) "Naming: strategies and techniques",
Computer Networks and ISDN Systems 23 (1991) 107-110.
Kahle, Brewster, (1991) "Document Identifiers, or
International Standard Book Numbers for the Electronic
Age", URL=file://quake.think.com/pub/wais/doc/doc-ids.txt
Kantor, B., and Lapsley, P., (1986) "A proposed standard for
the stream-based transmission of news", Internet RFC-977,
February 1986. URL=file://nnsc.nsf.net/rfc/rfc977.txt
Lynch, C., Coalition for Networked Information: (1991)
"Workshop on ID and Reference Structures for Networked
Information", November 1991. See
URL=wais://quake.think.com/wais-discussion-archives?lynch
Mockapetris, P., (1987) "Domain names - concepts and
facilities", RFC-1034, USC-ISI, November 1987,
URL=file://nnsc.nsf.net/rfc/rfc1034.txt
Neuman, B. Clifford, (1992) "Prospero: A Tool for Organizing
Internet Resources", Electronic Networking: Research,
Applications and Policy, Vol 1 No 2, Meckler Westport CT
USA. See also
URL=file://prospero.isi.edu/pub/prospero/oir.ps
Postel, J. and Reynolds, J. (1985) "File Transfer Protocol
(FTP)", Internet RFC-959, October 1985.
URL=file://nnsc.nsf.net/rfc/rfc959.txt
Yeong, W., (1991a) "Towards Networked Information Retrieval",
Technical report 91-06-25-01, June 1991, Performance
Systems International, Inc.
URL=file://uu.psi.com/wp/nir.txt
Yeong, W., (1991b), "Representing Public Archives in the
Directory", Internet Draft, November 1991. In
URL=wais://nnsc.nsf.net/internet-drafts?yeong
Author's address
Tim Berners-Lee
World-Wide Web project
CERN, 1211 Geneva 23, Switzerland
+41 (22)767 3755
timbl@info.cern.ch